ABSTRACT

This study assesses the reliability of the System for the Analysis
of Classroom Communication (SACC), devised to permit the gathering of
data descriptive of classroom communication between teacher and
pupils for evaluative purposes. The form of reliability assessed was
inter-observer agreement. The measure of inter-observer agreement
used was the Scott coefficient, which takes into account the number
of categories in the system and the frequency with which each is
used. The sample consisted of six schools, 20 teachers, eight subject
matters, and eight grade groups. Students were of average
socioeconomic status; most were "Anglos" but some were
Mexican-Americans. There were 33 sessions varying in length from
seven to 34 minutes, a session being defined as a coherent curricular
unit. The two observers, who coded the same sessions, were advanced
graduate students in education. Results indicated that the level of
inter-observer agreement was high enough to permit use of the
instrument for evaluative purposes. A modification of procedure
should be used when the goal is evaluation of a school or a grade
level. (EK)
I. INTRODUCTION

The System for Analysis of Classroom Communication (SACC) was
devised to permit the gathering of data descriptive of classroom
communication between teacher and pupils for general evaluative
purposes. It is clear from many educational studies of different
sorts (e.g., Bond & Dykstra, 1967) that "something" in the classroom
affects pupil achievement. It is suggested by a number of studies of
classroom interaction that different kinds of communication processes
may result in different pupil outcomes. Hence it appears to be
important to evaluate the classroom communication processes in any
broad evaluation program. Many systems have been proposed for the
purpose of describing classroom interaction, more than twenty of
which are brought together in Mirrors for Behavior (Simon & Boyer,
1967). These and other systems were reviewed for possible application
to the evaluation problem. Because of unreliability, incompleteness,
complexity, or cost, however, none of them proved to be suitable. On
the basis of both theory and empirical results, a new instrument, the
SACC, was devised, which was intended to be somewhat more analytical
than the simplest (and most used) system, Flanders's, yet less costly
than the most complex. It has undergone five revisions, based upon
experience in using the system in live classrooms. This development
and the justification for it will be described in detail in a
subsequent report. Attached is a copy of SACC, Form V (Appendix A).

In the summer of 1969 an attempt was made to assess the reliability
of the instrument. Reliability is not a simple concept in this
sphere. It is obviously desirable to have an observer capable of
replicating his own coding, but this requires either typescripts,
audio tapes, or video tapes, and even the best of these provide less
information than is available in the live classroom. Furthermore, the
situations in which the permanent records are obtained are ordinarily
more constrained than a normal classroom. The technique mentioned was
used in training the coders, but estimates of reliability in live
situations were desired. Here two alternatives present themselves: a
given coder can code two sessions with a single teacher and a single
subject-matter, or two coders can code the same sessions. In the
first case the question arises whether it is the teacher-consistency
or the observer-reliability that is largely responsible for
discrepancies in the results. Although the literature seems to
indicate strongly that teachers do not (in fact, cannot) change their
style significantly without intensive training, it is quite possible
that two very different sorts of lessons might occur: one where the
pupils were largely learning certain tools, and a second where they
were being encouraged to use the tools to arrive at new conclusions.
Whereas such problems could be resolved with the teachers, the result
would be increased stress upon the teacher, or greatly increased
observation time, neither of which is desirable for an evaluation
program. With proper sampling procedures, in a large-scale evaluation
study, this would, in fact, create no difficulties, but it would for
a briefer study of reliability. The second option, using two coders
at the same session, was chosen as being most economical. This type
of reliability is best called inter-observer agreement.



II. PROCEDURES 

A brief description of the coding process will be useful here (see
SACC, Form V, attached). SACC is a category system; all communicative
behavior can be coded into mutually exclusive categories. There are
12 major dimensions, 5 referring to teacher behavior, 5 referring to
pupil behavior, and 2 which refer to either or both. Within the major
dimensions are varying numbers of sub-categories, the number
depending upon the kinds of distinctions that coders have been able
to make, since the finer break-downs of earlier forms have proved
unreliable. The total number of categories is 31, with four
additional symbols used for special situations. Three of these last
four are essential in studying inter-observer agreement, in order to
keep the two records in step with one another; the fourth is a
code-modifier to permit an estimate of the length of pupils'
contributions. The system is committed to memory by the coder, and
coding is practiced on several kinds of materials until the coder can
reproduce to a reasonable degree the "master" coding and until his
speed has increased to the point where he can code at classroom pace.

Coding is done every 5 seconds, paced by a timer (see Apparatus
Report) which actuates both a buzzer and a light, as well as
displaying the number of the cell to be coded. If there is a change
in major dimension within the 5-second interval, both codes are
entered in the same cell (occasionally three codes are required). The
difficulty experienced in practice when two individuals judge the
time at which events occur is a very old problem, going back to 1796
when the Astronomer Royal of Britain dismissed his assistant for
"errors" in observation of the transit of stars, which led to a
series of researches on "prior entry" (Boring, 1950). The implication
of this research is that the time at which an event is observed to
occur is a function of that aspect of the situation receiving the
observer's attention. In addition there is some work in the
perception of language which indicates that the location of buzzes is
often displaced to the beginning or end of certain kinds of
psychological and linguistic units. Add to these the differences in
reaction-time and difficulties in identifying brief pauses which
occur at major syntactic breaks, and some variability in the exact
temporal location of behavior codes is inevitable.

The measure of inter-observer agreement used was the Scott
coefficient, $\pi = \frac{P_o - P_e}{1 - P_e}$, where $P_o$ is the
observed percent agreement and $P_e$ is that expected by chance. This
index takes account of the number of categories in the system as well
as the frequency with which each is used.
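The Scott coefficient can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used in the study: it assumes one code per 5-second cell for each observer (the report does not say how cells holding two or three codes were scored), it takes the chance term as the sum of squared category proportions pooled over both observers, and the code labels ("T1", "P1", and so on) are hypothetical rather than SACC categories.

```python
from collections import Counter


def scott_pi(codes_a, codes_b):
    """Scott's pi for two equal-length sequences of category codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(codes_a)
    # P_o: proportion of cells on which the two observers agree.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # P_e: chance agreement, from category proportions pooled over both observers.
    pooled = Counter(codes_a) + Counter(codes_b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if p_e == 1.0:
        raise ValueError("only one category was used; pi is undefined")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical records for eight cells (labels are made up for illustration).
observer_1 = ["T1", "T1", "P1", "P2", "T2", "P1", "T1", "P1"]
observer_2 = ["T1", "T2", "P1", "P2", "T2", "P1", "T1", "P2"]
print(round(scott_pi(observer_1, observer_2), 3))
```

Because the chance term is built from how often each category is actually used, the same raw agreement can yield different values of the coefficient for different sessions.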

III. CLASSROOM SAMPLES

The sample of classroom observations was far from ideal. Teachers
are very anxious when they suspect their teaching is being evaluated,
even if informally and unofficially, and principals are at the moment
loath even to appear autocratic. Hence we had to be content with
volunteers. In addition, the study used summer-session classes where
there is much less pressure than in regular session, the pupils are
in many cases volunteers, and there are many multi-grade classrooms.
There were 6 schools, 20 teachers, 8 subject-matters, and 8
grade-groups; all told there were 33 sessions where the coding was
independent. In some cases there were repeated measures on individual
teachers. Most of the students came from homes of average
socio-economic status; most were "Anglos" but there were some
Mexican-Americans in many of the classes. The sessions varied in
length from 7 to 34 minutes, a session being defined as a coherent
curricular unit. Table I shows the distribution of sessions for the
independent coding sessions.
IV. TRAINING OF CODERS

The two observers who participated in this study were both advanced
graduate students in education. One had participated in the
development of the instrument, and had considerable practice in
coding cards bearing descriptions of single items of behavior,
typescripts of classroom records, audio tapes, and a few video tapes,
as well as a few hours of live classroom coding. The other observer
had a crash course of about two weeks' duration under the guidance of
the first, entirely in the laboratory setting. Their eleven
non-independent coding sessions can be considered additional
training.

V. RESULTS

The Scott coefficient ($\pi$) is shown in Table II for each
independent session, in order of occurrence, together with the grand
mean and the means of successive thirds. The inter-observer agreement
is about 75 percent. The upper curve in Fig. 1 shows that there is
very little change over the course of the study, although the first
10 sessions are slightly less reliable.

The lower curve in Fig. 1 shows the total number of categories of
behavior observed in each session; it is relatively constant at a
fairly high level. The mean number of categories coded is 22, with a
range of 14 to 28 for single sessions. Table III shows the categories
most frequently coded.



VI. DISCUSSION

The level of inter-observer agreement is moderately high, certainly
high enough to be encouraging. Some of the sources of disagreement
are known and are modifiable either by more rigorous control of
training or by modification of coding procedures. These are:

1. Omissions: These represented 18 percent of the coded behaviors.
   A large proportion of these referred to brief behaviors noted by
   one but not by the other observer: 1^ (positive reinforcement)
   is often a perfunctory nod, "yes", "O.K.", or "right",
   immediately followed by a question or instruction, which
   dominates the 5-second cell, so that the observer's attention is
   drawn to a more complex decision process; constructive silence
   is often brief, while awaiting a pupil's response, so one
   observer may judge it a normal pause, similar to a syntactic
   break between speakers, whereas the other may think it somewhat
   longer, to give the pupil a brief time for thought.

2. Systematic Observer Bias: Among these biases were sensitivity to
   certain categories of behavior and insensitivity to others,
   different criteria for memory vs. thought processes or other
   distinctions, and differential knowledge of the expected
   performance level of children of various grades (in judging
   whether a child's answer was likely to be memory or thought).
   We have no measures for these biases, but analysis of the nature
   of the confusions could suggest measures. Such biases were
   prominent in training, where every effort was made to reduce
   them to a minimum.

Other sources of disagreement are known but not easily dealt with.
Among these are:

1. Audibility of teacher and pupil voices: This depends on ambient
   noise, classroom climate, acoustics, individual differences in
   voice and personality, and the observers' acuity. If things are
   bad enough, one can discard the entire session, but ordinarily
   there are several spots where the message is unclear. Sometimes
   the observer with the better acuity hears it and the other does
   not; often it is missed by both.

2. Length of Observation: Short sessions are less reliable than
   long ones, but it is necessary to consider as a session only a
   coherent instructional unit. A lower limit should be set for an
   acceptable length. This should be established empirically, but
   would probably be between 20 and 30 minutes.

3. Use of a very small number of categories: This yields a high
   $P_e$, hence $\pi$ will be low. This could occur in classes
   having special drill sessions, largely lecturing behavior on the
   part of the teacher, rapid-fire question and answer sessions,
   etc.

4. Difficulty of synchronizing the timers of two observers: This
   might result in displacement of a code by as much as two cells,
   particularly if one observer habitually codes early in the
   period and the other later.

5. Rate of interaction in the classroom: Some teachers in certain
   situations move at a very rapid pace: question, call on student,
   answer, feedback, question, all within a 5-second interval. This
   can be hard to keep up with, especially if the decision about
   intellectual level of questions and answers is difficult. In
   rapid-fire situations, some events are inevitably omitted; also
   a change to a higher level of question after several lower level
   questions is likely to go unnoticed, unless an observer is
   particularly alert to that topic, and then one observer may be
   so alerted and the other not.
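The effect of using very few categories, noted in the list above, can be checked with a small calculation. The sketch below is illustrative only: the raw agreement of 85 percent and the two usage profiles (two equally used categories versus ten equally used categories) are made-up figures, not data from this study, and the chance term is again taken as the sum of squared category proportions.

```python
def chance_agreement(proportions):
    """Chance agreement: the sum of squared category proportions."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(p * p for p in proportions)


def scott_pi_from_rates(p_o, p_e):
    """Scott's pi from observed and chance agreement."""
    return (p_o - p_e) / (1 - p_e)


P_O = 0.85  # hypothetical raw agreement, identical for both sessions
profiles = {
    "drill session, 2 categories": [0.5, 0.5],
    "varied session, 10 categories": [0.1] * 10,
}
for label, proportions in profiles.items():
    p_e = chance_agreement(proportions)
    print(f"{label}: P_e = {p_e:.2f}, pi = {scott_pi_from_rates(P_O, p_e):.2f}")
```

With these assumed figures the same raw agreement gives a coefficient of about .70 for the two-category session and about .83 for the ten-category session, which is the direction of the effect described in the text.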

Finally there are the confusions which still occur because of the
difficulty of defining the limits of the categories so that everyone
interprets them in the same way. The categories in which the
greatest amount of

disagreement occurred were: > ^ 2 ’ ^1 ’ ^4 ’ ^1 ’ 

is ordinarily very brief, as mentioned above. 3^ and 
32 could be defined more precisely, perhaps, but it is a matter of
judgment whether the teacher is structuring the lesson or giving new
material, a judgment peculiarly difficult for a sudden visitor to
make. It could be corrected by interviewing the teacher and observing
for several days, but this is not a practical solution for an
efficient evaluation instrument. It is difficult to see how 5-j can
be mistaken, but it is. Brief, immediate orders may be missed. It is
also difficult to see how 6^ can be mistaken, but there was, for
these observers, some confusion with 6g and [illegible]. It is also
possible that some responses are so brief ("yes" or a nod) that they
get lost like the other very brief ones do. All these categories
occur in the most frequent class (see Table III), and the frequency
of disagreement is not great in any session (see Table IV). As can be
seen, for no category do any large number of sessions show
disagreements; even for the categories where a relatively large
proportion of disagreements occur in one session, other sessions
typically show a small or insignificant proportion of disagreements.

Some categories never have any significant number of disagreements.
These are: 2^ (negative informational feedback), 6^ ("I don't know"),
7^ (irrelevant remarks), 8^ (practice small unit), 82 (practice more
complex unit), 9i (pupil-positive evaluation), 0]^ (pupil-negative
evaluation), 33 (interruptions), [illegible] (general noise and
confusion), and [illegible] (pupil's misbehavior). They are either
quite clearly defined behavioral units such as [illegible] and a^, or
infrequent behaviors (65, 73, 8^, 83, 9^, 0, and 34). For categories
with only a single instance of significant disagreement, the same
holds true: 2 , 2 bg, and bs are infrequent; 4 g (thought questions)
was the subject of much training and the decision criteria were made
specific; bp (pupil's question or statement regarding procedure) was
a clearly defined behavioral unit, not easily confused with others.



VII. CONCLUSIONS

The inter-observer agreement is sufficiently good to permit use of
the instrument for evaluation purposes. The time necessary to train
coders is not excessive; two weeks of half-time work seems adequate.
It is probably not necessary to use graduate-level personnel, but, as
an alternative, some teaching experience would probably be necessary.
Clerks would probably not be trainable in any reasonable length of
time.

A modification of procedure should be introduced when the goal is
evaluation of a school or a grade-level. Instead of multiple codings
in a single cell, a single code should be used, that for the behavior
which is occurring at the time signal. If the session is sufficiently
long (approximately 30 min.), the sampling of behavior at 5-second
intervals will give a sufficiently accurate estimate of the important
(i.e., frequently occurring) categories. In addition, a large source
of disagreement (the very brief behaviors overshadowed by the more
time-consuming ones) will be minimized, and the observers will not be
subject to as much time stress as at present, itself a source of
disagreement. There are special research and training problems for
which the coding of behavior sequences is important, but for general
evaluation purposes it is probably not necessary, and the suggested
simplification will make both coding and analysis very much simpler.
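The proposed single-code procedure amounts to sampling the behavior in progress at each time signal. A minimal sketch follows, under stated assumptions: the behavior labels and durations are invented for illustration, signals fall at 5, 10, 15, ... seconds, and a signal that lands exactly on a boundary is assigned to the behavior that starts there (the report does not specify this case).

```python
from bisect import bisect_right
from itertools import accumulate


def codes_at_signals(events, interval=5):
    """Return the category in progress at each time signal.

    events: ordered list of (category, duration_in_seconds) pairs.
    A signal on a boundary goes to the behavior starting there (an assumption).
    """
    # Cumulative end time of each behavior, in seconds from session start.
    ends = list(accumulate(duration for _, duration in events))
    codes = []
    t = interval
    while t < ends[-1]:
        # Index of the first behavior whose end time lies after the signal.
        codes.append(events[bisect_right(ends, t)][0])
        t += interval
    return codes


# Hypothetical 30-second stretch of a lesson (labels are not SACC codes).
session = [("question", 4), ("answer", 9), ("praise", 1),
           ("question", 6), ("answer", 10)]
print(codes_at_signals(session))
```

In this made-up stretch the one-second "praise" never coincides with a signal and so is not coded by either observer, which is how the simplification would remove very brief behaviors as a source of disagreement while still sampling the longer, more frequent ones.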



Table I: Distribution of Coding Sessions

Grade groups (column headings): 1-2, 2, 1-2-3, 2-3, 3-4, 4-5, 5-6, 6, 7, 8, 7-8, 2-5, 4-6

Subject            N    Sessions by grade group (grade columns not recoverable)
Social Studies     4    1, 1, 2
Reading            4    3, 1, 1
Math               9    1, 1, 1, 5, 1
Language Arts      8    5, 1, 1, 1
Science            5    1, 2, 1, 1
Fire Safety        1    1
Art                1    1
Foreign Language   1    1


Table II

Inter-observer Agreement for SACC, Form V*

Observation   π      Observation   π      Observation   π
     1       .64         12       .90         23       .81
     2       .70         13       .80         24       .67
     3       .82         14       .79         25       .74
     4       .55         15       .83         26       .65
     5       .55         16       .73         27       .85
     6       .82         17       .79         28       .77
     7       .75         18       .75         29       .88
     8       .83         19       .69         30       .59
     9       .74         20       .86         31       .77
    10       .62         21       .82         32       .79
    11       .74         22       .75         33       .84

Total: mean π = .75; R = .55-.90
First 10 Observations: mean π = .70
Second 10 Observations: mean π = .79
Last 13 Observations: mean π = .76
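
The summary rows of Table II can be re-derived from the per-session coefficients. The short check below is an added sketch, not part of the original analysis; it simply recomputes the grand mean, the range, and the means of the first 10, second 10, and last 13 sessions from the tabled values, listed in order of occurrence.

```python
from statistics import mean

# Scott coefficients for sessions 1-33, in order of occurrence (Table II).
pi_by_session = [
    0.64, 0.70, 0.82, 0.55, 0.55, 0.82, 0.75, 0.83, 0.74, 0.62, 0.74,
    0.90, 0.80, 0.79, 0.83, 0.73, 0.79, 0.75, 0.69, 0.86, 0.82, 0.75,
    0.81, 0.67, 0.74, 0.65, 0.85, 0.77, 0.88, 0.59, 0.77, 0.79, 0.84,
]


def summarize(values):
    """Grand mean, range, and means of the first 10, second 10, and remaining sessions."""
    return {
        "grand_mean": round(mean(values), 2),
        "range": (min(values), max(values)),
        "first_10": round(mean(values[:10]), 2),
        "second_10": round(mean(values[10:20]), 2),
        "last_13": round(mean(values[20:]), 2),
    }


print(summarize(pi_by_session))
```

The recomputed figures agree with the tabled summary: a grand mean of .75, a range of .55 to .90, and means of .70, .79, and .76 for the three successive blocks.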



TABLE III

Observed Frequency of Categories

Column headings: Most Coded, Moderately Coded, Least Coded, Rarely Coded

Most: x ≥ 15%
Moderate: 5% ≤ x < 15%
Least: 1% ≤ x < 5%
Rare: x < 1%

Average % of codings per observation for two observers

* Independent sessions only
* Prepared by Margaret Hubbard Jones and K. Olivier, July, 1969.
System for the Analysis of Classroom Communication (SACC), Form V (Short Form)